European Radiology

Springer Science and Business Media LLC

Preprints posted in the last 30 days, ranked by how well they match European Radiology's content profile, based on 14 papers previously published here. The average preprint has a 0.04% match score for this journal, so anything above that is already an above-average fit.

1
Economic costing of evaluating, deploying and monitoring an artificial intelligence-based reconstruction for acceleration of rectal MRI examinations

Harrison, C. A.; Wu, M.; White, O.; Hopkinson, G.; Hughes, J.; Robertson, S.; Scurr, E.; Shur, J.; Castagnoli, F.; Charles-Edwards, G.; Koh, D.-M.; Winfield, J.

2026-05-21 radiology and imaging 10.64898/2026.05.18.26353474 medRxiv
Top 0.1%
33.9%

Objectives: AI-based reconstructions can reduce MRI acquisition times and/or improve image quality. Guidelines recommend clinical evaluations and post-deployment monitoring of these novel methods; however, there has been little investigation of the clinical resources required for such assessments. The aim of this study was to evaluate the healthcare resource utilisation and potential savings associated with AI-based reconstructions in rectal MRI. Methods: A retrospective economic costing analysis was conducted from the NHS healthcare perspective. Resource utilisation data were extracted from the Electronic Patient Records for 9 healthy volunteer scans and 104 rectal MRI examinations evaluating an AI-based reconstruction. The resource profile included the MRI scan and the staff time required for data acquisition and analysis. Results: The clinical evaluation of the AI-based reconstruction cost £15,023. Deployment of the AI-based reconstruction reduced the length of an MRI rectum scan by 22 minutes, theoretically saving approximately £3,437 per month. Addition of post-deployment quality control scans reduced this monthly saving to £2,636. If the quality control scans were evaluated using radiologists rather than image quality metrics, monthly savings would be approximately £2,541. With ongoing quality control, the clinical evaluation cost would be recouped between 5.8 and 6 months, compared with 4.4 months without ongoing quality control. Conclusions: Deploying AI-based reconstructions can yield cost savings through reduced scanning times. Quality control tests using image quality metrics would save radiological burden and reduce costs compared with conducting repeated image scoring by radiologists.
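
The payback periods quoted in this abstract can be approximated by dividing the one-off evaluation cost by each monthly saving. The sketch below is illustrative only: the function name `payback_months` and the scenario labels are assumptions, and the calculation ignores any accounting detail not stated in the abstract. With the rounded figures quoted, simple division gives roughly 4.4, 5.7, and 5.9 months, close to but not identical to the reported range of 5.8 to 6 months with quality control.

```python
def payback_months(upfront_cost: float, monthly_saving: float) -> float:
    """Months needed for cumulative monthly savings to cover a one-off cost."""
    if monthly_saving <= 0:
        raise ValueError("monthly_saving must be positive")
    return upfront_cost / monthly_saving


EVALUATION_COST_GBP = 15_023  # clinical evaluation cost quoted in the abstract

# Monthly savings (GBP) quoted in the abstract; the labels are illustrative.
scenarios = {
    "no ongoing quality control": 3_437,
    "quality control via image quality metrics": 2_636,
    "quality control scored by radiologists": 2_541,
}

for label, saving in scenarios.items():
    months = payback_months(EVALUATION_COST_GBP, saving)
    print(f"{label}: {months:.1f} months")
```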

2
Performance of Vision-Language Models for Zero-Shot Lung Nodule Detection on Chest Radiographs

Nishio, M.; Matsuo, H.; Matsunaga, T.; Fujimoto, K.; Deperrois, N.; Nooralahzadeh, F.; Frauenfelder, T.; Krauthammer, M.; Murakami, T.

2026-06-03 radiology and imaging 10.64898/2026.05.31.26354565 medRxiv
Top 0.1%
23.4%

Background and Objectives: The ability of vision-language models (VLMs) to detect lung nodules on chest radiographs remains uncertain. This retrospective study aimed to compare the zero-shot performances of six VLMs for lung nodule detection using data from the Japanese Society of Radiological Technology (JSRT) chest radiograph database. Methods: A total of 247 chest radiographs from the JSRT database (154 with nodules and 93 without) were preprocessed and evaluated using six VLMs: RadVLM, gpt-4o-mini, Qwen3-VL-8B-Instruct, MedGemma-4b-it, LLaVA-Rad, and CheXpert Plus Model. Each model was tested using a zero-shot setting. The text outputs were binarized into nodule-present or nodule-absent labels by consensus between two radiologists. Sensitivity, specificity, accuracy, precision, and F1 scores were calculated. Pairwise differences in sensitivity, specificity, and accuracy were assessed using the McNemar test with Holm correction. Results: The overall performance was limited across all models. RadVLM achieved the highest accuracy (44.5%, 110/247) with perfect specificity (100.0%, 93/93) and precision (100.0%); however, its sensitivity was low (11.0%, 17/154). LLaVA-Rad showed the highest sensitivity (27.3%, 42/154) and F1 score (37.7%), but lower specificity (71.0%, 66/93). MedGemma-4b-it achieved 100.0% specificity, with a sensitivity of only 5.2% (8/154). Grade-specific analysis showed that detection rates were highest for obvious nodules and remained limited for subtle nodules. Pairwise analyses revealed significant differences in sensitivity and specificity for the selected model pairs, particularly between RadVLM and LLaVA-Rad. Conclusion: Current VLMs show limited zero-shot generalizability for lung nodule detection in the JSRT database, with marked trade-offs between sensitivity and specificity. Their near-term value may lie more in radiologist-assisted workflows than in stand-alone detection.
Clinical Impact: Current VLMs should not be used as stand-alone tools for lung nodule detection on chest radiographs because of their limited sensitivity and substantial model-dependent trade-offs. However, their high-specificity outputs in some models and higher-sensitivity behavior in others suggest potential roles in radiologist-assisted workflows, such as report drafting and second-reader support.
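
The metrics reported in this abstract follow from the four confusion-matrix counts. A minimal sketch is given below; the helper name `detection_metrics` is an assumption, and the false-negative and false-positive counts are derived here from the quoted fractions (for example, 154 - 42 = 112 missed nodule cases for LLaVA-Rad) rather than stated in the abstract.

```python
def detection_metrics(tp: int, fn: int, tn: int, fp: int) -> dict:
    """Standard binary detection metrics from confusion-matrix counts."""
    total = tp + fn + tn + fp
    predicted_positive = tp + fp
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / total,
        # Precision is undefined when the model never predicts "nodule present".
        "precision": tp / predicted_positive if predicted_positive else float("nan"),
        # F1 written directly in terms of counts: 2TP / (2TP + FP + FN).
        "f1": 2 * tp / (2 * tp + fp + fn),
    }


# Counts derived from the fractions quoted in the abstract.
radvlm = detection_metrics(tp=17, fn=137, tn=93, fp=0)
llava_rad = detection_metrics(tp=42, fn=112, tn=66, fp=27)

for name, metrics in [("RadVLM", radvlm), ("LLaVA-Rad", llava_rad)]:
    summary = ", ".join(f"{key}={value:.1%}" for key, value in metrics.items())
    print(f"{name}: {summary}")
```

Run on these counts, the sketch gives the quoted accuracy of 44.5% for RadVLM and F1 score of 37.7% for LLaVA-Rad.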

3
Efficacy Validation of a Novel MRI-Based Whole-Body Rapid Bone Scan (WB-RBS) Strategy for Diagnosing Bone Metastases: A Prospective Trial

Wu, X.; Zhang, J.; He, Y.; Zhang, Y.; Kang, X.; Hu, W.; Li, Y.; Ma, H.; Wang, Y.; Song, Y.; Chen, X.; Huo, F.; Zhang, Y.; Yin, H.; Xi, Y.

2026-05-24 radiology and imaging 10.64898/2026.05.17.26352855 medRxiv
Top 0.1%
22.2%

Background: Traditional bone scintigraphy for detecting malignant bone metastases is limited by suboptimal accuracy and radiation exposure. Whole-body magnetic resonance imaging (WB-MRI), while an alternative, requires lengthy scan times and high patient compliance. Purpose: To develop a novel, rapid whole body bone screening (WB-RBS) MRI protocol and evaluate its diagnostic performance for bone metastasis detection. Materials and Methods: Patients with pathologically confirmed malignancies and healthy controls were prospectively enrolled. All participants underwent WB-RBS (acquisition time: about 10 min); patients additionally underwent WB-MRI (about 70 min). Three radiologists, blinded to clinical data, independently evaluated the images for bone metastases. A consensus expert diagnosis served as the reference standard to calculate the diagnostic performance of WB-RBS. Specificity was further assessed in the healthy control group. Results: Seventy patients and 19 healthy controls were included. WB-RBS demonstrated excellent inter-reader agreement at the patient level. Compared with the reference standard, WB-RBS achieved an accuracy of 77.1%-91.4% at the patient level and a slightly lower accuracy (70.6%-82.5%) at the lesion level. At diagnostic confidence thresholds 1-3, the correlations between WB-RBS ratings and the reference standard were statistically significant for both patient- and lesion-level analyses. Conclusion: WB-RBS showed favorable inter-reader agreement and high accuracy for bone metastasis screening at the patient level, while substantially reducing scan time and cost. Its rapid, radiation-free nature and high accessibility offer distinct clinical advantages, supporting its potential as an alternative screening tool to conventional bone scintigraphy.

4
Adoption of Guided Structured Reporting in Routine Radiological Practice: A Six-Week Multi-Site Implementation Study in the UAE

Lorenz, D.; Jansen, S.; Knoche, J.; Wolf-Sebottendorff, R.; Awad, H. J.; Toker, I.

2026-05-22 radiology and imaging 10.64898/2026.05.20.26353646 medRxiv
Top 0.1%
20.5%

Background. Guided structured reporting has been proposed to address the limited availability of structured data in radiology, yet empirical evidence on its real-world adoption across users and imaging modalities remains scarce. Objective. To describe the adoption dynamics of a guided structured reporting system across multiple users and imaging modalities during a six-week implementation period. Methods. Retrospective observational study at two public tertiary hospitals in Abu Dhabi, United Arab Emirates. A guided structured reporting system was deployed for computed tomography (CT), magnetic resonance imaging (MRI), and mammography. Seven radiologists participated. The primary outcome was active in-software reporting time, recorded via system logs of mouse and keyboard interaction. Temporal trends in median reporting time per modality and individual user trajectories were analysed descriptively. After predefined data cleaning, 126 reports were included (84 CT, 27 MRI, 15 mammography). Results. Active in-software reporting time decreased across all modalities. Median reporting time fell from 130 s to 56 s for CT, from 383 s to 60 s for MRI, and from 126 s to 46 s for mammography (week 1 to week 6). Individual trajectories showed similar patterns, with the largest reductions during the early implementation phase. Subgroup analyses were limited by small sample sizes. Conclusions. Guided structured reporting was integrated into routine clinical workflows with temporal reductions in active reporting time across users and modalities, providing empirical evidence on the feasibility of workflow-integrated structured reporting in radiological practice.

5
An Experimental Investigation of the Relationship between AI-Human Workflow Design and Legal Liability for Radiologists: The Erroneous-Change Penalty and Omission Bias

Song, E. C.; Bernstein, M. H.; Sheppard, B.; Bruno, M. A.; Baird, G. L.

2026-05-22 radiology and imaging 10.64898/2026.05.20.26353717 medRxiv
Top 0.1%
14.7%

Background: With growing impetus to integrate artificial intelligence (AI) tools into radiology, clinical practices must navigate workflow redesign. This carries implications for medical malpractice liability. Methods: We conducted an online vignette experiment with United States adults who acted as hypothetical jurors in a malpractice case involving a missed intracranial hemorrhage. Participants (n=2,347) were randomized to one of 22 conditions: a no-AI control and 21 conditions involving a hypothetical AI system. These 21 conditions varied by whether (1) a single-read or double-read workflow was used, (2) the radiologist's initial interpretation was documented, (3) the radiologist changed their interpretation after viewing AI output, (4) the AI detected the abnormality, and (5) the AI error rate, either the False Discovery Rate (FDR) or the False Omission Rate (FOR), was provided to participants only, both participants and radiologist, or neither. The primary outcome was perceived liability, assessed by whether the radiologist met their duty of care. Findings: Perceived liability differed across conditions (p<0.0001). Double-read workflows (p<0.0001), documenting initial interpretations (p=0.0125), and providing participants with AI error rates, including the FDR (p=0.0038) or FOR (p=0.0035), reduced perceived liability. Liability was also lower when AI was incorrect (p<0.0001). Radiologists' awareness of AI error rates did not significantly impact liability. Notably, we observed an erroneous change penalty: the greatest liability occurred when radiologists initially identified an abnormality but later changed their interpretation to normal after seeing that AI identified the case as normal; conversely, perceived liability was lowest with documented, double-read workflows. Interpretation: Double-read workflows with documented initial interpretations and disclosure of AI error rates reduce perceived liability, though changing a correct initial interpretation increases it.
Strategic workflow design is critical for successful AI implementation that can mitigate malpractice risk.
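
For readers less familiar with the two AI error rates named in this abstract, the sketch below uses their standard definitions: the False Discovery Rate is the share of AI-positive calls that are wrong, and the False Omission Rate is the share of AI-negative calls that are wrong. The function names and the example counts are hypothetical and do not come from the study.

```python
def false_discovery_rate(tp: int, fp: int) -> float:
    """FDR = FP / (FP + TP): fraction of positive calls that are false."""
    return fp / (fp + tp)


def false_omission_rate(fn: int, tn: int) -> float:
    """FOR = FN / (FN + TN): fraction of negative calls that are false."""
    return fn / (fn + tn)


# Hypothetical counts, for illustration only.
print(f"FDR: {false_discovery_rate(tp=90, fp=10):.1%}")
print(f"FOR: {false_omission_rate(fn=5, tn=95):.1%}")
```

In the vignette's missed-hemorrhage setting, the FOR is the rate relevant to an AI output of "normal", since it describes how often such an output hides a true abnormality.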

6
Consensus-based technical recommendations for clinical translation of renal Dynamic Contrast-Enhanced (DCE) MRI

Gunwhy, E. R.; Kurugol, S.; Serai, S.; van der Molen, A. J.; Abou El-Ghar, M.; Buckley, D. L.; Hockings, P. D.; Jones, R. A.; Lim, R. P.; Mendichovszky, I. A.; Pedersen, M.; Reynolds, H. M.; Sanmiguel Serpa, L. C.; Wentland, A.; Zoellner, F. G.; Sourbron, S.; Dekkers, I. A.

2026-05-14 radiology and imaging 10.64898/2026.05.11.26352525 medRxiv
Top 0.1%
14.4%

Background: Dynamic contrast-enhanced (DCE) MRI has the potential to be a useful tool for non-invasively assessing renal haemodynamics and function; however, insufficient standardisation and difficulties in post-processing remain barriers to clinical translation. Purpose: To develop expert consensus-based technical recommendations for performing renal DCE-MRI in humans, relating to aspects of patient preparation, MRI hardware and acquisition parameters, and data analysis. Study Type: Systematic consensus process using an approximation to the two-step modified Delphi method. Population: Not applicable. Field Strength / Sequence: 1.5 T and 3 T / Renal gradient echo-based 3D DCE-MRI. Assessment: An international panel of experts was recruited and surveyed following a modified Delphi method to create consensus-based technical recommendations. Key areas for consensus were initially identified through a mixture of online and in-person discussions, and an initial survey round consisting of open- and close-ended questions. Consensus statements were formulated and iteratively refined to create the final recommendations. Statistical Tests: Consensus was defined as ≥75% agreement in response (excluding abstentions), and clear preference was defined as 60-74% agreement among the experts. Statements with ≥40% abstentions were either excluded from subsequent survey rounds or recirculated as a modified statement. Results: 22 experts initially participated in the Delphi panel, of which 16 responded to the first survey. 15 panellists responded to all subsequent surveys. Out of 46 statements, 37 reached consensus and one showed clear preference. Abstention of ≥40% was found in seven statements, which were excluded from the final set of recommendations. Data conclusion: These recommendations provide a starting point for MRI centres worldwide wishing to perform renal DCE-MRI, contributing to the harmonisation of DCE-MRI scan protocols and facilitating clinical translation.
These recommendations provide a practical minimum technical dataset for renal DCE-MRI acquisition and analysis to improve cross-site comparability and support responsible clinical translation.
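
The decision rule described under Statistical Tests can be sketched as a small classifier. This is an illustration under stated assumptions, not the panel's actual procedure: the function name, the category labels, and the three-way agree/disagree/abstain vote are hypothetical, and agreement values falling between 74% and 75% are treated here as clear preference because the abstract does not say how that gap was handled.

```python
def classify_statement(agree: int, disagree: int, abstain: int) -> str:
    """Classify one Delphi statement using the thresholds quoted in the abstract."""
    total = agree + disagree + abstain
    if total == 0:
        raise ValueError("at least one response is required")

    # Abstention of 40% or more: excluded or recirculated as a modified statement.
    if abstain / total >= 0.40:
        return "excluded_or_recirculated"

    # Agreement is computed over non-abstaining responses only.
    agreement = agree / (agree + disagree)
    if agreement >= 0.75:
        return "consensus"
    if agreement >= 0.60:
        return "clear_preference"
    return "no_consensus"


# Hypothetical votes from a 15-member panel.
print(classify_statement(agree=12, disagree=2, abstain=1))  # consensus
print(classify_statement(agree=9, disagree=5, abstain=1))   # clear_preference
print(classify_statement(agree=4, disagree=4, abstain=7))   # excluded_or_recirculated
```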

7
AI-Based Coronary Artery Calcification on Non-contrast CT: Performance Across Calcium Scoring, Lung Cancer Screening, and Liver Transplant Candidate Cohorts

Ludwig, K. D.; Hatt, C. R.; Keith, L.; Matyga, A. W.; Te, H. S.; Landeras, L.; Chelala, L.; Patel, A. R.; Chung, J. H.

2026-05-15 radiology and imaging 10.64898/2026.05.12.26352904 medRxiv
Top 0.1%
13.1%

Objective: Coronary artery calcification (CAC) assessment for cardiovascular risk stratification is traditionally achieved using ECG-gated computed tomography (CT). Automated deep-learning (DL) algorithms may streamline opportunistic CAC detection and scoring, particularly on non-gated CT scans. This study evaluated the performance of a fully automated DL-based CAC scoring algorithm ("DL-CAC") against expert human scoring. Methods: The algorithm was trained on 1,260 chest CT scans from multiple databases to automatically identify coronary calcium, calculate Agatston scores, and assign a cardiovascular disease (CVD) risk classification. Performance was assessed on a holdout dataset (n=500) comprising ECG-gated calcium scoring CT scans and lung cancer screening non-gated chest CTs as well as in an external, independent CT dataset (n=129) from liver transplant candidates. Agreement with expert scoring was assessed using intraclass correlation coefficient (ICC) for Agatston scores and Cohen's κ for CVD risk classification. Results: The algorithm demonstrated high agreement with expert scoring in the pooled calcium scoring and lung cancer screening cohorts, with an ICC of 0.947 for Agatston scores and κ of 0.936 for CVD risk classification. For liver transplant candidates, the algorithm exhibited substantial agreement with expert scoring of non-gated CT scans (κ=0.79) and a sensitivity of 90.4% and specificity of 96.4% in high-risk cases. Conclusion: These findings suggest that DL-based CAC scoring on non-gated CT scans may be a feasible alternative to traditional methods and could support opportunistic cardiovascular risk assessment in routine imaging. Further validation is warranted to assess clinical integration in broader practice settings.
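
Cohen's kappa, the agreement statistic used here for risk classification, corrects observed agreement for the agreement expected by chance. The sketch below computes the unweighted form from a confusion matrix of algorithm versus expert categories. The abstract does not state whether a weighted variant was used, and the example matrix is hypothetical rather than study data.

```python
def cohens_kappa(matrix: list[list[int]]) -> float:
    """Unweighted Cohen's kappa from a square rater-by-rater confusion matrix."""
    size = len(matrix)
    total = sum(sum(row) for row in matrix)
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(matrix[i][j] for i in range(size)) for j in range(size)]

    observed = sum(matrix[i][i] for i in range(size)) / total
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total**2
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (observed - expected) / (1.0 - expected)


# Hypothetical 2x2 example: rows = algorithm category, columns = expert category.
example = [[20, 5], [10, 15]]
print(f"kappa = {cohens_kappa(example):.2f}")
```

For this example, observed agreement is 0.70 and chance agreement is 0.50, so kappa is 0.40.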

8
Conus Medullaris Position in 9,808 Pediatric Lumbosacral MRI Examinations: A Large-Cohort Reference Distribution and the Normally Positioned Conus in Surgically Treated Tethered Cord

Tang, W.; Dong, Y.; Chen, J.; Yang, Y.; Huang, H.; Yu, M.; Zhu, J.; Shen, G.

2026-06-08 radiology and imaging 10.64898/2026.06.06.26355031 medRxiv
Top 0.1%
12.6%

Background. Tethered cord syndrome (TCS) is classically associated with a low-lying conus medullaris, yet many surgically treated children have a normally positioned conus (occult TCS). Large-scale normative data on conus position in children, and the diagnostic value of quantitative conus assessment, are limited. Purpose. To establish a large-cohort reference distribution for conus medullaris termination level in children, to quantify conus position in children surgically treated for presumed (occult) TCS, and to test whether automated conus segmentation and radiomics can distinguish TCS from normal. Materials and Methods. In this retrospective single-center study, conus termination level was extracted from structured radiology reports of consecutive pediatric lumbosacral MRI examinations and encoded numerically (L1 = 1, L2 = 2, etc.). Children surgically treated for tethered cord were identified by linkage to an operative registry (name and date of birth) and restricted to preoperative examinations. A deep-learning model (nnU-Net) was trained for conus segmentation on axial T2-weighted images. IBSI-compliant radiomic features were extracted; reproducibility was assessed by intra- and inter-observer intraclass correlation (ICC). A case-control radiomics analysis used batch-only ComBat harmonization and cross-validated L1-penalized logistic regression; discrimination was compared with conus level by paired bootstrap. Results. Among 9,808 examinations with a parseable conus level (98.5% of reports; parser validated against dual blinded annotation, 99.4% agreement, weighted kappa 0.946), the conus terminated in the L1 region in 85.7% and the L2 region in 14.3% of the reference cohort (postoperative examinations excluded, n = 9,655); a low-lying conus (>=L3) occurred in only 0.05% (5/9,655), and remained rare (0.14%, 14/9,808) including operated examinations (median L1; mean 1.13 +/- 0.33). 
A slightly more cephalad position was seen with increasing age (negligible correlation). Among 475 preoperative children surgically treated for tethered cord, 99.6% had a normally positioned conus (<=L2) and only 0.4% were low-lying. Automated conus segmentation achieved a held-out Dice of 0.85. Conus radiomics likewise did not distinguish TCS from controls (equivalence-tested null; full segmentation/radiomics pipeline reported in the companion methodological paper). Conclusion. In children, the conus medullaris terminates at L1-L2 in more than 99% of cases and is normally positioned in virtually all children surgically treated for TCS. Within the conus, neither position nor texture (radiomics) identifies tethered cord; whether the filum terminale carries a diagnostic signal was not tested here.

9
Automated Anatomy-Based Subsegmentation of Pelvic and Proximal Femoral CT: Validation Across Clinically Relevant Regions and Landmarks

Rashed, M.; Alabdulrahman, H.

2026-05-19 radiology and imaging 10.64898/2026.05.14.26353237 medRxiv
Top 0.1%
10.6%

Background: Automated pelvic CT segmentation has advanced to reliable coarse bone extraction. Yet the structured anatomical hierarchy required for morphometry, fixation planning, bone quality mapping, and arthroplasty workflows remains unachieved. This study developed and validated a fully automated anatomy-informed pipeline that converts standard pelvic CT into a comprehensive, surgeon-readable subsegmentation of the pelvis and proximal femur. Methods: Pelvic CT datasets were retrospectively collected from anonymized archives of hospitals affiliated with the Directorate of Health Affairs, Sharqia, Egypt. After eligibility screening, 757 normal adult cases were processed using a custom one-click 3D Slicer pipeline integrating TotalSegmentator for coarse extraction, followed by deterministic anatomy-based subsegmentation into 81 segments. One hundred randomly selected cases were validated against expert-corrected reference segmentations using Dice similarity coefficient, volume difference, surface distance metrics, and bilateral symmetry analysis. Results: Of 1,316 screened cases, 757 met eligibility criteria. Across 8,100 case-segment observations, the pipeline achieved a mean Dice of 0.9926 +/- 0.0465. Complete agreement was observed for the sacrum, ilium, acetabulum, anterior and posterior columns, sciatic buttress, and all landmarks. Relative decreases were confined to boundary-dependent regions. Bilateral symmetry analysis confirmed a median surface agreement of 99.85% within 5 mm. Conclusion: The pipeline demonstrated high accuracy and reproducibility across a large normal adult dataset, establishing a structured anatomical foundation for quantitative pelvic analysis and surgical planning workflows. Clinical feasibility across abnormal anatomy and decision-level applications awaits dedicated validation.
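
The Dice similarity coefficient used for validation measures the overlap between a predicted and a reference mask as twice the intersection divided by the sum of the two mask sizes. The sketch below works on flattened binary masks in plain Python; the function name and the toy masks are illustrative, and returning 1.0 when both masks are empty is a convention assumed here, not something stated in the abstract.

```python
def dice_coefficient(pred: list[int], ref: list[int]) -> float:
    """Dice similarity between two flattened binary masks of equal length."""
    if len(pred) != len(ref):
        raise ValueError("masks must have the same number of voxels")
    pred_bool = [bool(v) for v in pred]
    ref_bool = [bool(v) for v in ref]

    intersection = sum(p and r for p, r in zip(pred_bool, ref_bool))
    denominator = sum(pred_bool) + sum(ref_bool)
    if denominator == 0:
        return 1.0  # assumed convention: two empty masks agree perfectly
    return 2.0 * intersection / denominator


# Toy masks: 3 voxels each, 2 shared, so Dice = 2 * 2 / (3 + 3).
predicted = [1, 1, 0, 0, 1, 0]
reference = [1, 0, 0, 0, 1, 1]
print(f"Dice = {dice_coefficient(predicted, reference):.3f}")
```

With 81 segments evaluated in each of 100 validation cases, a per-segment score of this kind would be computed 8,100 times, matching the number of case-segment observations reported.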

10
Automated Segmentation of Cerebral Arteries on Three-Dimensional Rotational Angiography Using nnUNet v2: Prospective Validation with Quantitative Metrics and Expert Qualitative Assessment

Hofmeister, J.; Brina, O.; Rosi, A.; Bernava, G.; Reymond, P.; Muster, M.; Lovblad, K.-O.; Machi, P.

2026-05-26 radiology and imaging 10.64898/2026.05.20.26353640 medRxiv
Top 0.1%
10.3%

Background: Three-dimensional visualization and quantitative analysis of cerebral arteries on 3DRA are central to endovascular treatment planning, device selection, and cerebrovascular research. Manual segmentation is time-consuming and operator-dependent, yet no open-source deep learning model has been prospectively validated for this task on 3DRA. Methods: A nnUNet v2 model was trained for binary cerebral artery segmentation on 400 consecutive 3DRA acquisitions from three angiographic systems, comparing four configurations across architectures and loss functions. The best-performing configurations were prospectively validated on 40 patients using a dual approach: quantitative metrics (DSC, clDice, HD95, ASD, Precision, Recall), and blinded expert qualitative evaluation by two interventional neuroradiologists assessing 12 arterial segments, a global quality score, and clinical usability across 40 test cases. Results: The ensemble model achieved median DSC 0.917, clDice 0.932, and HD95 1.494 mm. Global quality scores were significantly lower for nnUNet v2 than for expert segmentations (median 4 vs 5, p<0.001), but nnUNet v2 segmentations were rated clinically usable in 88-90% of cases versus 95-98% for expert segmentations, without significant difference on the binary usability criterion. A consistent proximal-to-distal quality gradient was identified, with comparable scores at proximal arteries and the largest differences at distal arterial segments. Conclusion: nnUNet v2 with topology-aware training provides clinically usable cerebral artery segmentations on 3DRA, prospectively validated through both quantitative metrics and structured expert qualitative assessment, and represents a reproducible open-source foundation for endovascular and research applications.

11
Agreement of an AI tool for joint space width measurement in radiographic knee osteoarthritis: data from the LOSEIT trial

Mayar, S.; Henriksen, M.; Christensen, R.; Hansen, P.; Bliddal, H.; Nybing, J. U.; Nielsen, C. T.; Gudbergsen, H.; Boesen, M. P.; Brejnbol, M. W.

2026-06-12 radiology and imaging 10.64898/2026.06.11.26355242 medRxiv
Top 0.1%
7.0%

Background and rationale: Knee osteoarthritis (KOA) is a leading cause of lower limb disability worldwide, characterized by functional limitations, stiffness and pain. The incidence of KOA is especially tied to age and obesity. It is a disabling disease that often makes patients less physically active, thus increasing the risk of other diseases and mortality [1]. The clinical diagnosis of KOA is based on the symptoms and functional limitations of the joint. The diagnosis is usually supported with a radiograph (X-ray) of the weight-bearing knee. Radiographic features, such as Kellgren-Lawrence grade, are used as eligibility criteria for clinical studies while other features, such as joint space width (JSW), are used as endpoints for structural KOA progression [2,3]. While the use of these radiographic features is standard in academia, the use of JSW as a structural biomarker has received criticism. Critics point out that JSW is an indirect and projection-dependent measure of cartilage deterioration which is sensitive to technical factors such as the angulation of the X-ray beam and the positioning of the knee. Small differences in these factors can alter the measured joint space and may not reflect true disease progression [4,5]. Despite these limitations, minimum joint space width (mJSW) remains one of the most widely used structural biomarkers in KOA trials and is currently one of the only structural imaging measures accepted in regulatory guidance as evidence of disease modification in OA drug development [3]. For JSW to be reliable and consistent in determining the advancement of KOA, the use of fixed-flexion devices is crucial to reduce the risk of unwanted narrowing or widening of the radiographic joint space width [6,7]. The LOSEIT trial, on which the present study is based, acknowledges the angulation problem and uses a standard clinical fixed-flexion device in weight-bearing PA views to obtain reliable JSW results [8].
Historically, a radiologist would draw on and grade radiographs of the knee joint to extract the features. However, manual reading and annotation is time-consuming, with notable interobserver variance [9]. With increasing computational power and the use of deep neural networks, off-the-shelf artificial intelligence (AI) tools have become available for automatic extraction of radiograph features. Automation would free up time for radiologists and provide more consistent measurements due to the reproducible nature of the models [10]. These tools have received regulatory approval for commercial use; however, regulatory approval does not guarantee uniform or bias-free performance when used on real-world data [11]. Furthermore, in a large multi-hospital chest X-ray study, Zech et al. showed that convolutional neural networks achieved worse results on data from other hospitals than on the original hospitals in which they were tested [12]. This highlights the risk of overestimating the accuracy of AI tools when only internally validated. It is therefore apparent that external validation is required when testing these AI models. Objectives: The aim of this analysis is to evaluate the agreement of a commercially available AI tool for measuring JSW with the best practice radiologist annotation in the tibiofemoral joint of the knee in radiographs stabilized with a fixed-flexion device and acquired as part of a clinical trial. Methods: This study is a secondary analysis of the data from the LOSEIT trial, a randomized, double-blind, placebo-controlled, single-center trial, where patients were randomized to either liraglutide or identically appearing placebo after an initial weight-loss period to investigate the effects on KOA. Radiographs of the tibiofemoral joint were acquired at enrollment (week -8) and at end-of-trial (week 52) for a total acquisition-to-acquisition time of 60 weeks [13].
The primary analysis will assess agreement between AI-derived and reference-derived change in JSW from enrolment to follow-up. Change will be calculated as follow-up minus enrolment separately for the AI tool and the reference measurement. The main measure of interest will be the change in medial minimal JSW (mmJSW), with change in lateral minimal JSW (lmJSW), medial fixed JSW (mfJSW) and lateral fixed JSW (lfJSW) as secondary measures. This study will follow an equivalence framework using the two one-sided tests (TOST) approach with a Bland-Altman analysis as the main outcome. The equivalence margin will be set at δ = 0.5 mm. Agreement consistent with equivalence will be considered established if the upper limit of the 95% confidence interval (95% CI) for the upper limit of agreement (LoA) and the lower limit of the 95% CI for the lower LoA are within the established margins. The reference JSW will be the average measurement of two independent resident radiologists. If there is a mismatch in the measurements of more than 0.40 mm between the two radiologists, the radiologists will re-annotate the case independently. If the difference remains greater than 0.40 mm, a musculoskeletal radiology consultant will review the radiograph and establish the reference JSW. The index test will be the measurements output by the AI tool. Populations: Patients aged 18 to 74 with symptomatic knee osteoarthritis, radiographically confirmed KL grade 1-3, with a BMI ≥27, motivated for weight loss and in accordance with the LOSEIT trial inclusion criteria. Further statistical details. Sample size: Not applicable as this is a secondary analysis. Framework: This is an agreement study assessing the equivalence of a commercially available AI tool for radiographic evaluation of knee osteoarthritis with best practice radiologist measurements. Confidence intervals and P values: All 95% confidence intervals and P-values will be two-sided.
Statistical software: SAS Studio and/or R version 4.2.2 (or newer).
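
The equivalence criterion described in this protocol (the outer 95% confidence bounds of both Bland-Altman limits of agreement must fall within a margin of 0.5 mm) can be sketched as below. This is a simplified illustration, not the planned SAS or R analysis. It uses a normal quantile instead of a t quantile and the common large-sample approximation sqrt(3 s^2 / n) for the standard error of a limit of agreement, and the function name and example data are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def loa_equivalence(ai_change: list[float], ref_change: list[float],
                    margin: float = 0.5) -> dict:
    """Bland-Altman limits of agreement with approximate 95% CIs and margin check."""
    diffs = [a - r for a, r in zip(ai_change, ref_change)]
    n = len(diffs)
    if n < 3:
        raise ValueError("need at least three paired measurements")

    z = NormalDist().inv_cdf(0.975)  # about 1.96; assumption: normal, not t
    bias = mean(diffs)
    sd = stdev(diffs)
    lower_loa = bias - z * sd
    upper_loa = bias + z * sd

    # Approximate standard error of each limit of agreement.
    se_loa = sqrt(3 * sd**2 / n)
    lower_ci_bound = lower_loa - z * se_loa  # lower CI limit of the lower LoA
    upper_ci_bound = upper_loa + z * se_loa  # upper CI limit of the upper LoA

    return {
        "bias": bias,
        "lower_loa": lower_loa,
        "upper_loa": upper_loa,
        "lower_ci_bound": lower_ci_bound,
        "upper_ci_bound": upper_ci_bound,
        "equivalent": lower_ci_bound >= -margin and upper_ci_bound <= margin,
    }


# Hypothetical JSW changes in mm (follow-up minus enrolment), not trial data.
ai = [0.10, 0.05, -0.02, 0.00, 0.03, -0.04, 0.06, 0.01, -0.01, 0.02]
reference = [0.0] * len(ai)
print(loa_equivalence(ai, reference))
```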

12
Assessment of the accuracy of lung lesions diagnosis in adolescents with osteosarcoma using artificial intelligence

Uskova, N. G.; Gombolevskiy, V. A.; Chernina, V. Y.; Burenchev, D. V.; Akhaladze, D. G.; Panina, E. V.; Karachunskiy, A. I.; Tereschenko, G. V.; Goncharov, M. Y.; Soboleva, E. A.; Konopleva, E. I.; Bydanov, O. I.; Plekhov, S. Y.; Grachev, N. S.

2026-06-10 radiology and imaging 10.64898/2026.06.08.26354011 medRxiv
Top 0.1%
6.9%

Background. Lung metastases are the main cause of death in osteosarcoma (OS). Accurate detection of lung nodules on computed tomography (CT) is critically important for identifying disseminated disease and planning surgical treatment. The use of artificial intelligence (AI) in the search for lung nodules increases the accuracy of diagnosis and reduces the chance of missing metastases. Objective: to evaluate the accuracy of lung nodule diagnosis in adolescents with OS using AI. Methods. A retrospective assessment of CT scans of adolescents with OS was performed. A pathological nodule with an average size of ≥4 mm was considered a target finding. The diagnostic accuracy of an AI algorithm previously trained on an adult dataset was evaluated, and the number of false positives (FP) and false negatives (FN) was determined. Sensitivity, specificity, accuracy, area under the ROC curve (AUC), positive predictive value, negative predictive value, and F1-measure were calculated. Based on the obtained results, the effectiveness of the algorithm was assessed. Results. 248 CT scans of adolescents with OS were evaluated. The AI algorithm gave an FP result in 5 cases (2.02%), an FN result in 34 cases (13.71%), and a correct result (true positive or true negative) in 209 cases (84.27%). The diagnostic accuracy of the algorithm was 0.843 (95% CI 0.794-0.887). Applying the AI algorithm in a radiologist's practice for this specific clinical task would increase sensitivity from 0.805 to 0.891, with an absolute decrease in FN results of 8.59% and a relative decrease of 44%. Conclusion. The obtained results confirm the practical value of the AI algorithm and justify the implementation of AI-assisted systems in the diagnostic protocols for lung metastases in adolescents with OS.
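
The link between the quoted sensitivities and the quoted reduction in false negatives can be checked with a short calculation, since the false-negative rate is one minus sensitivity. The sketch below uses the rounded sensitivities from the abstract, so the absolute reduction comes out at about 8.6 percentage points rather than the reported 8.59. The function name is illustrative.

```python
def fn_rate_reduction(sens_before: float, sens_after: float) -> tuple[float, float]:
    """Absolute and relative reduction in false-negative rate (1 - sensitivity)."""
    fn_before = 1.0 - sens_before
    fn_after = 1.0 - sens_after
    absolute = fn_before - fn_after
    relative = absolute / fn_before
    return absolute, relative


absolute, relative = fn_rate_reduction(sens_before=0.805, sens_after=0.891)
print(f"absolute reduction: {absolute:.1%}, relative reduction: {relative:.0%}")

# Overall accuracy recomputed from the quoted counts: 209 correct of 248 scans.
print(f"accuracy: {209 / 248:.3f}")
```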

13
DKK1 and CKAP4 expression is associated with cervical lymph node metastasis in tongue squamous cell carcinoma

Fujita, H.; Takahashi, O.; Yada, N.; Tanaka, J.; Haraguchi, K.; Morioka, M.; Yaginuma, T.; Sasaguri, M.; Kokabu, S.; Habu, M.

2026-06-01 dentistry and oral medicine 10.64898/2026.05.29.26354440 medRxiv
Top 0.1%
5.1%

Objective: To identify Dickkopf-1 (DKK1) as a prognostically relevant candidate in head and neck squamous cell carcinoma and to evaluate whether DKK1 and cytoskeleton-associated protein 4 (CKAP4) expression is associated with cervical lymph node metastasis in tongue squamous cell carcinoma (TSCC). Methods: DKK1 was screened using the Human Protein Atlas Pathology Atlas. Immunohistochemical expression of DKK1 and CKAP4 was examined in 54 patients with primary TSCC (cT1-4N0) treated surgically between 2015 and 2020. Nine cases were excluded because of insufficient tissue blocks or inadequate staining quality, leaving 45 evaluable cases. Associations with delayed cervical lymph node metastasis were assessed together with conventional clinicopathological factors, including infiltrative growth pattern (INF) and pathological depth of invasion (pDOI). Results: In public database analysis, high DKK1 expression was associated with poorer overall survival in head and neck squamous cell carcinoma. In the TSCC cohort, pDOI ≥5 mm and INF pattern c were significantly associated with cervical lymph node metastasis. Positive DKK1 and CKAP4 expression were also significantly associated with cervical lymph node metastasis. Furthermore, combined DKK1/CKAP4 positivity, when incorporated with INF and pDOI, provided additional risk stratification, and cases with all 3 factors showed a markedly increased likelihood of cervical lymph node metastasis. Conclusions: Expression of DKK1 and CKAP4 was associated with cervical lymph node metastasis in TSCC. Combined assessment of DKK1/CKAP4 expression with INF and pDOI may improve pathological risk stratification and may help identify patients who require closer neck evaluation and postoperative management.

14
Opportunistic CT Attenuation Biomarkers of Anemia Are Associated With Impaired Myocardial Flow Reserve and Cardiovascular Outcomes

Miller, R. J.; Shanbhag, A.; Yi, J.; Kwiecinski, J.; Kavanagh, P.; Ramirez, G.; Lemley, M.; Kamagate, A.; Slipczuk, L.; Travin, M. I.; Alexanderson, E.; Carvajal-Juarez, I.; Packard, R. R. S.; Al-Mallah, M.; Einstein, A. J.; Acampa, W.; Knight, S.; Le, V. T.; Mason, S.; Wopperer, S.; Chareonthaitawee, P.; Rosamond, T. L.; DeKemp, R. A.; Buechel, R. R.; Berman, D. S.; Dey, D.; Di Carli, M. F.; Slomka, P.

2026-05-19 radiology and imaging 10.64898/2026.05.14.26353239 medRxiv
Top 0.1%
5.0%

Background: Anemia is an established marker of cardiovascular disease severity and risk that leads to elevations in resting myocardial blood flow (MBF) and impaired myocardial flow reserve (MFR) in patients without obstructive coronary artery disease (CAD). Anemia can potentially be detected opportunistically from blood pool density changes on computed tomography (CT) imaging. Objectives: We evaluated relationships between chamber density measurements and hemoglobin, positron emission tomography (PET) findings, and cardiovascular events. Methods: We included 33,460 patients from 13 sites in REFINE-PET who underwent PET and 24,368 patients undergoing lung cancer screening chest CT. A deep learning model segmented cardiac chambers from CT images, then quantified chamber density. We evaluated the relationship between chamber density measures and resting MBF and MFR, as well as associations with death or myocardial infarction (MI). Results: We included a total of 57,828 patients. A higher density in myocardium compared to left ventricle blood pool was associated with reduced MFR (adjusted odds ratio 3.02 per SD increase, 95% confidence interval [CI] 2.72-3.38) and an increased risk of death or MI (adjusted hazard ratio [HR] 1.38 per SD increase, 95% CI 1.26-1.51). Having myocardial density higher than blood pool density was also associated with cardiovascular death in patients undergoing low-dose chest CT (adjusted HR 1.73, 95% CI 1.20-2.52). Conclusions: In a large multimodality dataset, lower cardiac chamber density is associated with impaired MFR and independently associated with cardiovascular events. These biomarkers can be automatically extracted from CT to provide physiologic insights and potentially guide patient care.

15
Comparative Study on Image Quality of Deep Learning and Adaptive Statistical Iterative Reconstruction-V in Thin Layer CT of liver Lesions

Yang, J.; Li, L.; Cao, J.; Zhang, J.

2026-05-26 radiology and imaging 10.64898/2026.05.23.26353923 medRxiv
Top 0.2%
4.8%

Objective: This study aims to compare the advantages and disadvantages of deep learning image reconstruction (DLIR) and adaptive statistical iterative reconstruction-V (ASIR-V) in thin-slice (2.5 mm) CT images of hepatic lesions characterized by high and low contrast. Additionally, the study seeks to determine the optimal DLIR strength for the evaluation of liver lesions. Methods: A retrospective analysis was performed on 90 patients who underwent abdominal contrast-enhanced CT scans. Group A comprised 48 patients with low-contrast lesions, while Group B included 42 patients with high-contrast lesions. The acquired images were reconstructed using post-processing DLIR at low (DLIR-L), medium (DLIR-M), and high (DLIR-H) strengths, all with a slice thickness of 2.5 mm (subgroups A1-A3, B1-B3). Furthermore, images were reconstructed with ASIR-V at 50% strength at slice thicknesses of 2.5 mm and 5 mm (subgroups A4/B4 and A5/B5, respectively). CT values and standard deviations (SD) of the liver and lesions were measured, and the corresponding signal-to-noise ratio (SNR) and contrast-to-noise ratio (CNR) were calculated. The edge rise slope (ERS) was determined using ImageJ software by measuring CT values along a line from the liver parenchyma to the lesion. Objective metrics were compared using one-way ANOVA, with independent samples t-tests applied for inter-group differences. Subjective scoring, which encompassed noise level, diagnostic confidence, and lesion margin delineation, was conducted by two radiologists, with differences analyzed using the Kappa test. Results: Objective evaluation revealed a progressive decrease in lesion SD and a progressive increase in SNR and CNR from subgroups A1/B1 to A3/B3. The SD of subgroup A2 decreased by 57.4% compared to A4, while the SNR and CNR of A2 increased by 19.3% and 24.6%, respectively, compared to A4. Although subgroup B2 had a lower SNR than B5, the difference was not statistically significant. SNR and CNR in B2 increased by 24.1% and 11.9%, respectively, compared to B4.
ERS gradually decreased from A1/B1 to A3/B3. ERS values in A2 and B2 increased by 27.0% and 39.4%, respectively, relative to A5 and B5. Although A3 had a lower ERS than A1 and A2, all DLIR subgroups exhibited higher ERS than A5; similar trends were observed in Group B. Subjective evaluation indicated good inter-reader agreement (Kappa > 0.61, p < 0.05). As DLIR strength increased, noise scores rose progressively in both groups. However, noise in A2 and B2 was lower than in A4/A5 and B4/B5. Diagnostic confidence and lesion margin delineation scores were highest in A2 and B2, while all subjective scores were lowest in A5 and B5. Discussion: Most prior studies evaluated the liver or vessels, or confirmed that image quality can be maintained at low doses, and few have examined specific individual lesions. This study therefore investigates individual lesions. Lesion detail and detection rate were analyzed separately to confirm the clinical acceptability of 2.5 mm DLIR images for lesions of different contrast. Conclusion: For both high- and low-contrast hepatic lesions, DLIR provides superior image quality compared to ASIR-V, with the 2.5 mm DLIR-M setting being optimal. DLIR-M reduces image noise, improves spatial resolution, and produces images more suitable for diagnostic purposes.
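
The abstract reports SNR, CNR, and percentage changes between subgroups without stating the formulas used. The Python sketch below shows one common convention, purely as an assumption: SNR as mean CT value over its SD, and CNR as the absolute liver-lesion difference over a noise SD. The function names and the example values are hypothetical and are not data from the study.

```python
def snr(mean_hu, sd_hu):
    """Signal-to-noise ratio: mean CT value divided by its standard deviation."""
    return mean_hu / sd_hu


def cnr(liver_hu, lesion_hu, noise_sd):
    """Contrast-to-noise ratio: absolute liver-lesion difference over noise SD."""
    return abs(liver_hu - lesion_hu) / noise_sd


def percent_change(value, reference):
    """Percentage change of `value` relative to `reference`."""
    return (value - reference) / reference * 100.0


if __name__ == "__main__":
    # Hypothetical region-of-interest measurements in Hounsfield units.
    liver_mean, liver_sd = 110.0, 10.0
    lesion_mean = 70.0
    print("SNR:", snr(liver_mean, liver_sd))
    print("CNR:", cnr(liver_mean, lesion_mean, liver_sd))
    print("change vs reference (%):", percent_change(12.5, 10.0))
```

Other definitions exist (for example, using background fat or air for the noise term), so any comparison with the reported percentages would need the authors' exact formulas.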

16
MRI-Based Pressure Gradient Mapping in Patient-Specific Models of Coarctation of the Aorta

Nair, P.; Ferrari, L.; Loecher, M.; McGrath, C. M.; Castillo Passi, C. A.; Marsden, A. L.; Ennis, D. B.

2026-06-03 radiology and imaging 10.64898/2026.05.27.26353898 medRxiv
Top 0.2%
4.0%

Purpose: Accurate assessment of the pressure gradient (ΔP) across aortic coarctation (CoA) is critical for determining disease severity and the need for intervention. Current non-invasive methods are unreliable, while invasive catheterization remains the clinical gold standard. This study evaluates a novel MRI acquisition strategy, 4D-FlowP, that simultaneously encodes blood velocity and acceleration to enable reliable non-invasive pressure gradient mapping in CoA. Methods: Patient-specific compliant aortic phantoms were created from clinical MRI data of two patients with CoA. Additional geometries were synthetically generated by increasing stenosis severity. Phantoms were studied in an MRI-compatible flow loop under physiologically realistic flow and pressure conditions. Pressure gradients were estimated using conventional 4D-Flow MRI, 4D-FlowP, and fluid-structure interaction (FSI) simulations. Results were compared against ground-truth catheter-based measurements across multiple flow rates and stenosis severities. Results: Conventional 4D-Flow consistently underestimated ΔP (slope = 0.63, R² = 0.75) relative to catheter measurements. In contrast, 4D-FlowP demonstrated substantially improved agreement (slope = 0.95, R² = 0.75). FSI simulations showed the highest overall agreement with catheter-derived ΔP (slope = 1.14, R² = 0.82). Scan times for 4D-FlowP were comparable to 4D-Flow (26 vs. 24 minutes). Conclusion: 4D-FlowP enables more accurate MRI-based pressure gradient mapping in CoA than conventional 4D-Flow when compared to ground-truth catheter measurements. These findings support further in vivo evaluation of 4D-FlowP as a non-invasive alternative for functional assessment of CoA severity.
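
Agreement in this abstract is summarized by a regression slope and R² against catheter measurements. As a minimal sketch of how such figures are typically obtained, the Python code below fits an ordinary least-squares line; this is an assumption, since the abstract does not state the fitting method. The data are synthetic and chosen so that the estimate is exactly half the reference, illustrating a slope below 1 as systematic underestimation.

```python
def fit_line(reference, estimate):
    """Ordinary least-squares fit of estimate = intercept + slope * reference.

    Returns (slope, intercept, r_squared).
    """
    n = len(reference)
    mean_x = sum(reference) / n
    mean_y = sum(estimate) / n
    sxx = sum((x - mean_x) ** 2 for x in reference)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(reference, estimate))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum(
        (y - (intercept + slope * x)) ** 2 for x, y in zip(reference, estimate)
    )
    ss_tot = sum((y - mean_y) ** 2 for y in estimate)
    r_squared = 1.0 - ss_res / ss_tot
    return slope, intercept, r_squared


# Synthetic illustration only (mmHg): the estimate is half of the reference.
catheter_dp = [10.0, 20.0, 30.0, 40.0]
estimated_dp = [5.0, 10.0, 15.0, 20.0]
slope, intercept, r_squared = fit_line(catheter_dp, estimated_dp)

if __name__ == "__main__":
    print(f"slope={slope:.2f} intercept={intercept:.2f} R^2={r_squared:.2f}")
```

Slope and R² answer different questions: the slope captures systematic under- or overestimation, while R² captures scatter, which is why two methods can share the same R² and still differ in bias.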

17
DISCERN: A Clinical Impact-aware Framework for Radiology Report Comparison

Sharma, R.; Beeche, C.; Dong, J.; Zhuang, R.; Qu, H.; Zhang, R.; Gangaram, V.; Goswami, P.; Xin, J.; Ballard, J.; Goldberg, A.; Sagreiya, H.; Long, Q.; Chen, T.; Witschey, W. R.

2026-05-27 radiology and imaging 10.64898/2026.05.26.26353612 medRxiv
Top 0.2%
4.0%

The surge in medical imaging has spurred the development of vision-language models (VLMs) to alleviate radiologist workloads. However, clinical deployment is hindered by the lack of meaningful evaluation frameworks. Current metrics, ranging from semantic similarity to large language model (LLM)-based judges, often fail to distinguish between clinically trivial and critical discrepancies and poorly reflect real-world clinical judgment. To address this, we introduce DISCERN (Discordance and Significance-aware Entity-level Radiology Report Comparison). DISCERN is a significance-aware framework that weighs report errors based on their potential impact on patient care. Our results demonstrate that DISCERN powered by closed-source LLMs aligns more closely with expert radiologist assessments than traditional metrics or current LLM evaluators, providing a more interpretable and clinically relevant benchmark. By modeling radiologist prioritization and entity-level feedback, DISCERN facilitates targeted model refinement and supports the safer integration of generative AI into clinical workflows.

18
Scan length as a major driver of CT radiation dose: a diagnostic reference level audit from Kosovo

Rudi, G.; Vula, F.; Bicaku, A.; Dedushi, K.; Ahmetgjekaj, I.

2026-05-17 radiology and imaging 10.64898/2026.05.12.26353024 medRxiv
Top 0.2%
3.7%

Computed tomography is the largest contributor to population radiation dose from medical imaging, yet no diagnostic reference levels (DRLs) have been published from Kosovo or the Western Balkans. This retrospective audit analyzed all CT examinations performed on a 128-slice scanner at the University Clinical Centre of Kosovo between January and March 2026. After exclusions, 1,535 acquisitions from 1,092 patients across nine examination categories were analyzed. Local DRLs were defined as the 75th percentile and compared against German (BfS 2022) and Turkish (Kahraman et al., 2024) reference values. Head CT (n = 590) demonstrated CTDIvol 4.7% below the BfS DRL yet scan length 98.5% above the orientation value (median 25.8 vs 13 cm). Abdomen-pelvis CTDIvol matched the BfS reference while scan length exceeded it by 28%. Coronary CTA showed CTDIvol 377% above the reference, consistent with retrospective ECG gating. Excess scan length, not CTDIvol, is the major driver of elevated dose at this institution. The identified excesses are correctable through technologist landmarking training, protocol review, and enabling iterative reconstruction.
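
The audit defines a local DRL as the 75th percentile and reports deviations from reference values as percentages. The Python sketch below illustrates both steps under stated assumptions: the percentile interpolation method (`inclusive`) is a guess, since the abstract does not specify one, and the function names are illustrative. Only the head CT scan lengths (25.8 cm vs 13 cm) come from the abstract.

```python
import statistics


def local_drl(dose_values):
    """75th percentile of a dose metric (third quartile, inclusive method)."""
    return statistics.quantiles(dose_values, n=4, method="inclusive")[2]


def percent_vs_reference(local_value, reference_value):
    """Signed percentage deviation of a local value from a reference value."""
    return (local_value - reference_value) / reference_value * 100.0


# Head CT scan length from the abstract: median 25.8 cm vs 13 cm orientation value.
head_scan_length_excess = percent_vs_reference(25.8, 13.0)

if __name__ == "__main__":
    print(f"head CT scan length: {head_scan_length_excess:+.1f}% vs reference")
    # Hypothetical dose values, only to show the percentile call.
    print("75th percentile:", local_drl([1.0, 2.0, 3.0, 4.0, 5.0]))
```

A positive result means the local value exceeds the reference, so the same helper covers both the scan-length excess and the CTDIvol shortfall quoted above.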

19
MR-Guided PET Denoising and Resolution Enhancement Improves Visual Interpretation and Preserves Quantitative Behavior Across Amyloid Tracers

Szujewski, C.; Shepherd, T. M.; Ghesani, M.; Ponisio, M.; Lavely, W.; Schramm, G.; Bollack, A.; Ades-aron, B.; Lemberskiy, G.

2026-05-19 radiology and imaging 10.64898/2026.05.14.26353149 medRxiv
Top 0.2%
3.6%

Background: Amyloid-beta PET provides critical biomarker data for Alzheimer's disease diagnosis and anti-amyloid therapy evaluation, yet low spatial resolution and partial volume effects result in decreased interpretability, particularly in cases with low or borderline cortical amyloid burden. While quantitative metrics (SUVr, Centiloid) aid in interpretation of amyloid burden, disagreement between visual reads and quantitative burden does occur, further blurring the line between positive and negative scans. We evaluated whether a vendor-neutral MR-guided PET denoising and resolution enhancement method (MRG) that uses Bowsher regularization improves image interpretability and reader performance while preserving established quantitative biomarkers across multiple amyloid tracers, leading to increased concordance between visual reads and quantitative metrics. Methods: Standard (STN) and MRG PET images were compared for four tracers: [18F]AV-45 ([18F]florbetapir, FBP), [18F]florbetaben (FBB), [18F]flutemetamol (FMM), and [11C]Pittsburgh compound-B (PiB), collectively from 24 MRI and 33 PET scanners. Quantitative equivalence was assessed by comparing Standardized Uptake Value ratio (SUVr) and Centiloid scores. In three of the four tracers (FBP, FBB, FMM), visual-quantitative concordance (AUC) and reader performance were evaluated in a blinded multi-reader study by four highly experienced brain PET readers who assessed image quality, artifact severity, reader confidence, and binary amyloid positivity. Results: Across all tracers, MRG preserved quantitative SUVr and Centiloid metrics relative to STN (R² > 0.90 for all tracers) without introducing bias to the SUVr metric. Concordance between visual reads and quantitative burden measures significantly improved with MRG. In the multi-reader study, MRG resulted in significantly higher image quality, lower artifact burden, and greater reader confidence compared to STN (p < 0.0001).
Reader accuracy increased from 0.89 to 0.94, and the false-negative rate decreased from 0.08 to 0.04. Crucially, improvements in reader confidence, accuracy, and the reduction in false negative reads were most pronounced in cases with low amyloid burden near the threshold of visual positivity. Conclusions: MRG denoising and resolution enhancement improved perceived image quality, reader confidence, and accuracy for amyloid PET while preserving standard quantitative behavior across tracers. By improving cortical definition in visually challenging low-burden cases without disrupting established SUVr/Centiloid behavior, MRG may reduce visual-quantitative discordance and support more confident amyloid PET interpretation near the threshold of positivity.

20
A Radiologic Masquerade: Camrelizumab-Associated Breast Lesions That Mimic Progression

Hu, Y.; Shui, Y.; Li, W.; Liang, J.; Song, Y.; Wang, M.; Zhang, F.; Zhang, M.; Wang, H.; Ji, L.; Li, M.; Wang, C.; Shao, N.; Kuang, X.; He, S.; Zhang, X.

2026-06-03 radiology and imaging 10.64898/2026.05.30.26353749 medRxiv
Top 0.3%
2.8%

Background: Immune-related adverse events (irAEs) involving the breast remain rarely reported. Purpose: To characterize clinical and imaging features of camrelizumab-associated breast lesions (CABLs). Materials and Methods: This retrospective dual-cohort study (October 2019 to February 2026) included 196 female patients. Cohort A comprised 180 non-breast cancer patients; Cohort B comprised 16 breast cancer patients receiving neoadjuvant camrelizumab. Baseline characteristics, treatment response, and CT/MRI features were compared between CABL-positive and CABL-negative groups using Mann-Whitney U and chi-square tests. Results: CABLs developed in 34.4% (62/180) of Cohort A and 93.8% (15/16) of Cohort B. CABL-positive patients were younger (median 50.5 vs 54.5 years; P = 0.006) and more often premenopausal (46.8% vs 26.3%; P = 0.009). The objective response rate was relatively high among CABL-positive patients; in Cohort A, the disease progression rate was lower in the CABL-positive group than in the CABL-negative group (3.2% vs 17.8%), whilst in Cohort B, the pathological complete response rate was as high as 53.3% (8/15). On CT/MRI, CABLs were predominantly multiple (62.5%), with well-defined margins and unrestricted diffusion. The predominant time-intensity curve (TIC) pattern was washout (46.7%). Median time to onset was 2-3 cycles (at the second MRI scan); most lesions disappeared (40.3%) or shrank (46.8%) during follow-up. ADC values of lesions were significantly higher than those of primary tumors (1.847 ± 0.284 vs 0.976 ± 0.055 ×10⁻³ mm²/s; P < 0.001). Histopathology of four lesions revealed lymphocytic infiltration and fibrosis without malignancy. Conclusion: CABLs are benign reactive changes driven by multiple factors. Their recognition prevents misinterpretation as disease progression, thereby avoiding unnecessary treatment discontinuation or biopsy.